A method of analyzing a source image to separate text from graphics, by: (a) scanning and digitizing the source image to obtain a binary image including black and white objects; (b) filtering out the noise from the binary image; (c) extracting therefrom the contours of the black objects and the white objects; (d) evaluating inclusion relationships between the objects, and generating a tree-like structure of such relationships; (e) utilizing the contours for measuring the objects to obtain the shape properties of each object; (f) effecting classification of the objects as graphics or text according to the measured shape properties and the generated tree-like structure of the inclusion relationships; and (g) utilizing the source image and the classification of the objects for generating outputs representing graphics and text, respectively.
BACKGROUND OF THE INVENTION

The present invention relates to a method of analyzing documents or other source images in order to discriminate between text and graphics, and thereby to separate text from graphics.

Discrimination between text and graphics is frequently essential when processing documents. For example, some document processing applications are interested only in graphics (or text). Other document processing applications apply different processes to text and graphics and therefore have to segment the image into regions of text, graphics and half-tone.

All applications discriminating between text and graphics require a definition distinguishing between the two. Some define text as characters grouped in strings, whereas characters which appear isolated are considered as graphics. Others define text as characters wherever they appear, regardless of font or size. The latter definition appears more appropriate but results in misclassifications; for example, a circle might be misclassified as the character "o". Whichever definition is used, most algorithms proposed in the literature do not perform true character recognition, which is far more expensive, but rather use simple heuristics for classification.

There are two principal approaches by which text is discriminated from graphics: "top-down" and "bottom-up". In the "top-down" approach, the image is first divided into major regions, which are further divided into subsequent regions. In the "bottom-up" approach, the image is first processed to determine the individually connected components. These components, when identified as characters, are grouped into words, words into sentences, and so on. The top-down approach is knowledge based. It is suitable only for images which are composed of strictly separated regions of text and graphics; text words which lie within graphic regions are classified as graphics. The bottom-up approach, on the other hand, is more reliable but time consuming. Therefore, the two approaches should be used in conjunction: first a top-down method detects the graphics regions, and then a bottom-up method detects the text within these regions.

The run-length smearing algorithm (RLSA) is an example of a top-down method. This algorithm segments and labels the image into major regions of text lines, graphics and half-tone images. The algorithm replaces 0's by 1's if the number of adjacent 0's is less than a predefined threshold (0's correspond to white pixels and 1's correspond to black pixels). This one-dimensional operation is applied line-by-line as well as column-by-column to the two-dimensional bitmap image. The two results are then combined by applying a local AND to each pixel location. The resulting image contains black blocks wherever printed material appears on the original image, producing an effect of smearing. The blocks are then labeled as text lines, graphics or half-tone images using statistical pattern classification (for example, the number of black pixels in a block, or the number of horizontal white/black transitions).
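A minimal sketch of the smearing step described above, in Python. It assumes a bitmap held as a list of rows with 1 = black and 0 = white, as in the text, and, as an assumption, smears only white gaps that lie between two black pixels (gaps touching the image border are left alone). The function names and threshold parameters are illustrative, not taken from a published RLSA implementation.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = black, 0 = white


def smear_row(row: List[int], threshold: int) -> List[int]:
    """Fill white gaps shorter than `threshold` that lie between two black pixels."""
    out = row[:]
    last_black = None
    for x, v in enumerate(row):
        if v == 1:
            gap = x - last_black - 1 if last_black is not None else 0
            if 0 < gap < threshold:
                for k in range(last_black + 1, x):
                    out[k] = 1
            last_black = x
    return out


def rlsa(img: Bitmap, h_threshold: int, v_threshold: int) -> Bitmap:
    """Smear rows and columns separately, then combine with a pixel-wise AND."""
    horizontal = [smear_row(r, h_threshold) for r in img]
    columns = [list(c) for c in zip(*img)]
    vertical = [list(r) for r in zip(*(smear_row(c, v_threshold) for c in columns))]
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]


print(smear_row([1, 0, 0, 1, 0, 0, 0, 0, 1], 3))  # [1, 1, 1, 1, 0, 0, 0, 0, 1]
```

The gap of two white pixels is filled because it is below the threshold of 3, while the gap of four is kept, which is exactly the sensitivity to threshold choice discussed in the next paragraph.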

The RLSA algorithm is fast but is restricted to a certain class of images. No skewed text lines are allowed in these images, and the dimensions of characters must fit the predefined threshold parameters; otherwise, characters will remain isolated (if the parameters are too small) or text lines will get combined (if the parameters are too big).

After a rough classification is obtained by a "top-down" algorithm, the graphic blocks are further processed by a "bottom-up" algorithm to obtain a detailed classification. Bottom-up algorithms start with a process to determine the individually connected components. Several algorithms are known which perform connected-component detection. These algorithms can be combined with chain-code generation algorithms in order to extract as much information as possible during one raster scan over the image. Such a "combined" algorithm can operate fast on a run-length formatted image (run time is proportional to the number of "runs" in the image, which is roughly proportional to the length of the boundaries in the image). At the end of such a process, the following raw information is available for each connected component: (1) area (number of pixels forming the connected component); (2) chain-code description of the boundaries (a chain for each boundary); and (3) identification of the enclosing connected component and of the enclosed connected components.

This raw information can be further processed to resolve other properties: (4) enclosing rectangle; (5) Euler number (Euler number = 1 - number of holes in the shape); (6) perimeter length (total length of the boundaries); and (7) hull area.

More shape properties than those of (4)-(7) can be resolved from the information of (1)-(3), but properties (4)-(7) are the most valuable for discriminating character symbols with minimum effort. The Euler number is available with no additional effort (Euler number = 2 - number of chains, since an object with h holes has one outer boundary chain and h hole chains). The enclosing rectangle can be computed in one scan over the chains. Perimeter length roughly equals the total number of links in the chain code; better estimates can be obtained by other methods, but this estimate is fairly good. The hull area can be computed by first finding the convex-hull polygon and then finding the area of that polygon, which is a simple task.
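A minimal sketch of how properties (4)-(7) fall out of the chain codes. It assumes a Freeman 8-direction code in image coordinates (y grows downward), a chain given as a start point plus a list of codes, and that the first chain of an object is its outer boundary. These conventions, the function names, and the use of Andrew's monotone-chain hull with the shoelace area formula are illustrative choices, not the method of the source.

```python
from typing import List, Tuple

Point = Tuple[int, int]
# Assumed Freeman 8-direction convention: code -> (dx, dy), with y growing downward.
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def chain_points(start: Point, codes: List[int]) -> List[Point]:
    """Decode a chain code into the list of boundary points it visits."""
    pts = [start]
    x, y = start
    for c in codes:
        dx, dy = DIRS[c]
        x, y = x + dx, y + dy
        pts.append((x, y))
    return pts


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in order, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> int:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; the absolute value makes orientation irrelevant."""
    s = 0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


def shape_properties(chains: List[Tuple[Point, List[int]]]) -> dict:
    """chains[0] is assumed to be the outer boundary; any further chains are holes."""
    outer = chain_points(*chains[0])
    xs = [p[0] for p in outer]
    ys = [p[1] for p in outer]
    return {
        "enclosing_rectangle": (min(xs), min(ys), max(xs), max(ys)),  # property (4)
        "euler_number": 2 - len(chains),                              # property (5)
        "perimeter": sum(len(codes) for _, codes in chains),          # property (6), one unit per link
        "hull_area": polygon_area(convex_hull(outer)),                # property (7)
    }


square = ((0, 0), [0, 0, 6, 6, 4, 4, 2, 2])
print(shape_properties([square]))
```

Counting every link as one unit, diagonal links included, matches the rough perimeter estimate the text describes.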

Most algorithms which discriminate text according to local shape features use the properties listed above. Algorithms based on local shape features have two major flaws: (1) they may misclassify graphics as text (a circle may be classified as "o"); and (2) they cannot detect abnormal strings (for example, they cannot detect a dashed line as graphics; instead, each minus sign is detected as a character symbol and the whole string is detected as text).

These flaws were fixed in a known Text-String Separation algorithm, but at a high price in processing time. The clustering of characters into strings takes most of the time. The algorithm uses the Hough transform to detect collinear components and then groups them into words and phrases if they conform to some statistical pattern. The algorithm succeeds in classifying abnormal strings as graphics, but is sensitive to parameter settings: a wrong selection may cause connected components which belong to one line to be grouped in different cells (under-grouping), or it may cause several parallel strings to be grouped into a single cell (over-grouping). The Hough transform may also mistakenly detect a group of vertical components as a vertical string although these components are part of horizontal text lines.

Another difficulty is that strings which have an arc orientation (rather than a linear orientation) are not discriminated as text. The same happens with short isolated strings (strings containing fewer than three characters).

All of the algorithms mentioned above fail to properly discriminate between text and graphics in images which contain a large variety of font sizes. Moreover, they cannot handle blocks of reversed text (reversed text is white text over a black background).

OBJECTS AND SUMMARY OF THE INVENTION 

An object of the present invention is to provide a novel method, having advantages in one or more of the above respects, for analyzing a source image to separate text from graphics.

According to the present invention, there is provided a method of analyzing a source image to separate text from graphics, comprising: (a) scanning and digitizing the source image to obtain a binary image including black and white objects; (b) filtering out the noise from the binary image to obtain a filtered binary image; (c) extracting the contours of the black objects and the white objects from the filtered binary image; (d) evaluating inclusion relationships between the objects, and generating a tree-like structure of such relationships; (e) utilizing said contours for measuring the objects to obtain the shape properties of each object; (f) effecting classification of the objects as graphics or text according to the measured shape properties and the generated tree-like structure of the inclusion relationships; and (g) utilizing said source image and said classification of the objects for generating outputs representing graphics and text, respectively.

According to further features in the preferred embodiment of the invention described below, in step (b) the noise is filtered out by dilation of black pixels; in step (e) the objects are measured in a top-down sequence, starting with the object at the root of the tree; and in step (c) extracting the contours of the black objects and the white objects from the filtered binary image is effected by a single scan in which a window is convolved with the filtered binary image in a raster fashion. In addition, the window scans the image along a line and returns an indication of the type of pattern seen from the window and an indication of the center of the window; each type of pattern is processed differently to determine whether a new object is started, continued or ended, all objects intersecting the current scan line being processed in parallel.

In the described preferred embodiment, when a maximal point is encountered during the window scan, it is considered to be a starting point of a new object; but if the scan later indicates that it was a maximal point of a previously indicated object, the new object is merged with the previously indicated object.

Further features of the invention will be apparent from the description below.

BRIEF DESCRIPTION OF THE DRAWINGS 


The invention is herein described, by way of example only, with reference to the accompanying drawings, 
wherein: 

Fig. 1 is an overall pictorial diagram illustrating one application of the method of the present invention;
Fig. 1a illustrates a typical document including graphics and text in different sizes, orientations and fonts, and the results of its being processed according to the present invention;

Fig. 2 is a flow diagram illustrating the main steps in a method of analyzing a source image to separate 
text from graphics in accordance with the present invention; 

Fig. 3 is a diagram illustrating the scanning and digitizing step (a) in the diagram of Fig. 2; 
Fig. 4 is a diagram illustrating the dilation method for filtering noise in accordance with step (b) of the flow diagram of Fig. 2;

Figs. 5a and 5b are flow diagrams illustrating one algorithm for performing step (b) in the flow diagram of Fig. 2; and Figs. 5c and 5d are diagrams helpful in understanding this step;

Fig. 6a is a diagram illustrating the contour detection step (c) in the flow diagram of Fig. 2; and Fig. 6b more particularly illustrates one example of performing that step;

Figs. 7a and 7b are flow diagrams illustrating an algorithm which may be used for performing step (c);
Fig. 8 is a decision table used in the algorithm of Fig. 7 indicating how the different states are handled; 
Fig. 9 is a diagram illustrating the tree-generation step (d) in the flow diagram of Fig. 2; 
Fig. 10 is a flow diagram of one algorithm that may be used for performing a polygonal approximation in the object measurement step (e) of Fig. 2;

Figs. 11a and 11b are flow diagrams illustrating one algorithm that may be used in performing the classification step (f) of Fig. 2; and

Fig. 12 is a flow diagram illustrating one algorithm for performing the output-generation step (g) in Fig. 2. 


DESCRIPTION OF A PREFERRED EMBODIMENT 
Overall System 

Fig. 1 pictorially illustrates a method of analyzing a source document 2 in accordance with the present invention to separate text from graphics, the text being output in document 4, and the graphics being output in document 6. For purposes of example and of showing the capability of the method, the source document 2, as shown enlarged in Fig. 1a, includes graphics and text of different sizes, orientations and fonts.
Thus, the source document 2, containing the source image of both text and graphics, is scanned by an optical scanner 8, and its output is fed to an image processing system, generally designated 10, which includes an image disc 12, a memory 14, and a CPU 16. The image processing system 10 outputs the processed information via a plotter 18 in the form of the two documents 4 and 6: document 4 contains the text of the original document 2, and document 6 contains the graphics of the original document 2.

Fig. 2 is a flow diagram illustrating seven basic steps (a-g), generally designated by blocks 21-27, performed by the image processing system 10, as follows:

(a) scans and digitizes the source image (document 2) to obtain a binary image including black and white objects (block 21);

(b) filters out the noise from the binary image to obtain a filtered binary image (block 22);

(c) extracts the contours of the black objects and the white objects from the filtered binary image (block 23);

(d) evaluates the inclusion relationship between the objects and generates a tree-like structure of such relationship (block 24);

(e) utilizes the contours detected in step (c) for measuring the objects to obtain the shape properties of each object (block 25);

(f) classifies the objects as graphics or text according to the measured shape properties and the inclusion relationship obtained in step (d) (block 26); and

(g) generates, via the output plotter 18, outputs representing text (document 4) and graphics (document 6), respectively (block 27).

Following is a more detailed description of each of the above steps:

Scanning and Digitizing (Block 21, Fig. 2) 

This step is effected to obtain a binary version of the source image. It may be carried out by an optical scanner, a CCD (charge-coupled device) scanner, etc., to produce a binary file on disc or tape (e.g., image disc 12, Fig. 1) containing the bitmap representation of the source image. The bitmap can be a stream of bits, with each bit corresponding to a black or a white pixel, or it can be encoded in runs. It will be assumed that run-length coding is used, by which a sequence of black (or white) pixels is encoded by its colour together with the length of the sequence up to the next transition in colour. A typical scanning resolution is 50 pixels/mm.
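A minimal sketch of the run-length coding assumed here, for a single scan line in Python. The (colour, length) pair layout and the 1 = black / 0 = white convention are assumptions; the source does not fix a file format.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[int, int]  # (colour, length); colour 1 = black, 0 = white


def encode_runs(row: List[int]) -> List[Run]:
    """Encode a scan line as runs: each run is a colour plus its length up to the next transition."""
    return [(colour, len(list(group))) for colour, group in groupby(row)]


def decode_runs(runs: List[Run]) -> List[int]:
    """Expand (colour, length) runs back into a list of pixels."""
    return [colour for colour, length in runs for _ in range(length)]


row = [0, 0, 1, 1, 1, 0]
print(encode_runs(row))  # [(0, 2), (1, 3), (0, 1)]
```

Since runs only record colour transitions, later steps that examine transitions rather than every pixel (the dilation of Appendix A and the contour scan of Appendix B) can work directly on this form.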

Fig. 3 diagrammatically illustrates the scanning and digitizing step, wherein it will be seen that the source image, as shown at 31, is converted to a digitized bitmap representation of the source image, as shown at 32. It will also be seen that the bitmap representation of the source image 32 in Fig. 3 includes image data 32a and noise 32b.

Filtering Noise (Block 22, Fig. 2) 


The second step performed by the image processing system 10 of Fig. 1, as shown in the block diagram of Fig. 2, is noise filtration, namely the removal of the noise signals 32b in the bitmap representation illustrated at 32 in Fig. 3. This step is carried out by a dilation operator which changes a white pixel to black if its distance from the nearest black pixel is below a predefined threshold.

This step is more particularly shown in Fig. 4, wherein it will be seen that the image data before dilation, as shown at 41, includes a number of isolated black pixels 41a which are very close to a group of black pixels 41b, and which are absorbed to form a single group 42a after the dilation step, as shown at 42. This operation, which widens the black pixels and therefore connects isolated pixels together, significantly decreases the number of isolated black pixels in the surroundings of black objects.

A simple dilation algorithm can be: set an output pixel to white only if all input pixels in its surrounding are white (the conjunction of their whiteness); otherwise, set it to black.
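A direct bitmap version of this rule, for illustration. It assumes 1 = black and 0 = white, and measures distance with the chessboard (Chebyshev) metric, so a pixel turns black when any black pixel lies within d pixels horizontally and vertically. Both the metric and the name `dilate` are assumptions; the run-length variant preferred by the text is given in Appendix A.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = black, 0 = white


def dilate(img: Bitmap, d: int) -> Bitmap:
    """A pixel stays white only if every pixel in its (2d+1) x (2d+1) surrounding is white."""
    h, w = len(img), (len(img[0]) if img else 0)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the surrounding window at the image borders.
            out[y][x] = int(any(
                img[yy][xx]
                for yy in range(max(0, y - d), min(h, y + d + 1))
                for xx in range(max(0, x - d), min(w, x + d + 1))
            ))
    return out


print(dilate([[1, 0, 0, 0]], 1))  # [[1, 1, 0, 0]]
```

This brute-force form costs O(d²) per pixel, which is why the text turns to a run-length formulation for large images.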

The dilated image 42 is intermediate and is used only to partition the image roughly into regions of black and white objects. Later in the process, as will be described below, these regions will be classified, and the pixels of the original image will be coloured properly according to the class in which they reside.

Noise filtration by dilation provides two advantages: (a) it maintains the basic shape properties of the original objects; and (b) it facilitates the later determination as to which class the black pixels in the original image belong.

Dilation can be achieved in many ways. When performed on a bitmap, it can be achieved by simple hardware or software; but when performed on a run-length coded image, it is more complicated.

Preferably, in order to utilize the advantages of run-length coding, a specific apparatus is used operating according to the algorithm illustrated in the flow diagrams of Figs. 5a and 5b, and also in Appendix A at the end of this specification.

Contour Detection (Block 23) 

In this step, the image obtained by the dilation is scanned in order to label the objects and to extract their contours. A contour of an object is defined as the chain of line segments which track the boundary of the object, separating black and white pixels. If the object is not solid (i.e., it contains holes), the contours of these holes are extracted as well. Therefore, an object may have more than one contour.

Fig. 6a illustrates the contour extracting step, wherein it will be seen that the black object shown at 61 is converted to the contour 62, constituted of a chain of line segments which track the boundary of the object 61.
Many algorithms are known for such chain generation in order to extract the contour. Some algorithms use a sequential approach, by which a contour is tracked from beginning to end before another contour is tracked. However, this approach may result in many scans over the image, especially when the image contains many large objects, and therefore may take a considerable period of time.

Preferably, a single-scan approach is used in the method of the present invention. In this approach, a 2 x 2 window is convolved with the image in a raster fashion. The raster scan can again benefit from the compact run-length coding, since only locations of colour transitions need be examined instead of the whole image.

The general idea of the one-scan approach is as follows: the window scans the image and returns an indication of the type of pattern seen from the window and an indication of the position of the center of the window. Each type of pattern is processed differently to determine whether a new object is started, continued or ended. All objects intersected by the current scan line are processed in parallel. A new object always starts at a maximal point and ends at a minimal point, but not all maximal points necessarily start new objects, nor do all minimal points always end existing objects. The minimal points pose no problem, because by the time they are reached, sufficient information is already at hand to determine whether or not they are true end points. With the maximal points, however, there is a problem of ambiguity: at the time a maximal point is encountered, it cannot always be determined whether this point is a local maximum of an existing object or a global maximum of a new object.
In the described process, a maximal point is always considered to be a starting point of a new object. If later on it is discovered that it was a starting point of an existing object, the two objects, the true and the artificial, are merged and the artificial object is deleted.
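The provisional-object idea can be illustrated with a simplified analogue: labeling connected black runs in one raster pass, starting a new provisional label whenever a run touches nothing on the previous line (a maximal point), and merging labels with a union-find structure when a later run shows that two provisional objects are the same. This sketch labels objects only and does not build the contour chains of Figs. 7a, 7b and 8; the 8-connectivity rule, the half-open run representation and all names are assumptions.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # black run, half-open interval [start, end)


def label_runs(lines: List[List[Run]]) -> List[List[int]]:
    """Single raster pass over run-length lines; returns one object label per run."""
    parent: List[int] = []  # union-find forest over provisional object labels

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the older label; the younger, "artificial" object is merged into it.
            parent[max(ra, rb)] = min(ra, rb)

    labels: List[List[int]] = []
    prev_runs: List[Run] = []
    prev_labels: List[int] = []
    for runs in lines:
        current: List[int] = []
        for start, end in runs:
            # 8-connectivity: runs on consecutive lines touch if they overlap or meet diagonally.
            touching = [lab for (ps, pe), lab in zip(prev_runs, prev_labels)
                        if start <= pe and ps <= end]
            if not touching:
                parent.append(len(parent))  # maximal point: provisionally start a new object
                current.append(len(parent) - 1)
            else:
                current.append(touching[0])
                for lab in touching[1:]:
                    union(touching[0], lab)  # two provisional objects turn out to be one
        labels.append(current)
        prev_runs, prev_labels = runs, current
    return [[find(lab) for lab in row] for row in labels]


# A "U" shape: its two arms start as separate objects and merge at the bottom.
print(label_runs([[(0, 1), (2, 3)], [(0, 1), (2, 3)], [(0, 3)]]))  # [[0, 0], [0, 0], [0]]
```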

At each maximal point two chains are started downwards, and at each minimal point two chains are connected. Therefore, a contour is initially composed of more than one chain, and only when the object is ended are the chains connected properly to form one closed-loop contour. With each contour, two pointers are connected to point at the two objects on the right-hand and left-hand sides of the contour. These pointers later enable the inclusion relationship between the objects to be extracted.

Fig. 6b illustrates a particular case, in which contour 1 is composed of chains A-F, contour 2 is composed of chains G-H, and contour 3 is composed of chains I-J. It will be seen that object 1 (background) is bounded by contours 1 and 3; object 2 is bounded by contours 1 and 2; object 3 is bounded by contour 2; and object 4 is bounded by contour 3.

Figs. 7a and 7b illustrate an example of an algorithm which may be used for this step; Fig. 8 elaborates on the operations of blocks 71 and 72 in Fig. 7b and illustrates a decision table for the different states. Appendix B at the end of this specification illustrates an example of an algorithm for this purpose.

Tree Generation (Block 24) 

In this step, the inclusion relationship between the objects is evaluated, and a tree-like structure of such relationships is generated. This relationship is utilized at the time of classification, since it is sometimes important to have information about the objects included within one object in order to assign it a proper class. This relationship can be extracted easily from the database of objects and contours produced in the previous step. All that is necessary is to set a pointer from each object to the object which includes it, namely to its predecessor. In that way, a tree-like structure is formed. There is one object which has no predecessor, which is usually the white background.

The predecessor of an object may be found as follows: assuming that the contours are always directed counter-clockwise, first find out which of the contours is the outermost (it being recalled that an object has more than one contour if it contains holes), and then set the pointer to point at the object on the right side of this contour. This object is the predecessor.
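A sketch of the predecessor rule. It assumes a hypothetical data model in which each object lists the contours bounding it, each contour is a closed polygon recording the object on its right-hand side when traversed counter-clockwise, and the root (background) object has no contour of its own. The outermost contour is taken to be the one enclosing the largest area, which is an assumption about how "outermost" is decided.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Contour:
    vertices: List[Point]        # closed polygon traced along the boundary
    right_object: Optional[int]  # object on the right-hand side when traversed counter-clockwise


def enclosed_area(poly: List[Point]) -> float:
    """Shoelace formula; used only to decide which contour is outermost."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


def predecessor(contours: List[Contour]) -> Optional[int]:
    """The object on the right of the outermost contour, or None for the root."""
    if not contours:
        return None
    outer = max(contours, key=lambda c: enclosed_area(c.vertices))
    return outer.right_object


def build_tree(objects: Dict[int, List[Contour]]) -> Dict[int, List[int]]:
    """Map each object to the list of objects it directly includes."""
    children: Dict[int, List[int]] = {obj: [] for obj in objects}
    for obj, contours in objects.items():
        parent = predecessor(contours)
        if parent is not None:
            children[parent].append(obj)
    return children


# Background 0 contains a black square 1, which contains a white hole 2.
outer = Contour([(0, 0), (4, 0), (4, 4), (0, 4)], right_object=0)
hole = Contour([(1, 1), (3, 1), (3, 3), (1, 3)], right_object=1)
print(build_tree({0: [], 1: [outer, hole], 2: [hole]}))  # {0: [1], 1: [2], 2: []}
```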

Fig. 9 diagrammatically illustrates the step of determining the inclusion relationship. Graph 92 in Fig. 9 is the tree-like structure obtained from the image 91.

Object Measurements (Block 25) 

This involves measuring the objects to obtain the shape properties of each object. The following primitives are used: (a) area of the object (measured in pixels), (b) number of contours, and (c) perimeter length of each contour (measured in pixels). From these primitives, the following are determined: (a) elongation, (b) hull area, (c) hull eccentricity, (d) black/white ratio, (e) Euler number, and (f) number of sharp corners.

Elongation measures the ratio between the width of the lines forming the object and the overall dimensions of the object. Elongation is computed as follows:

$$\text{Elongation} = \frac{P\left(P - \sqrt{P^{2} - 16A}\right)}{8A}$$

where A is the area of the object, and P is the perimeter of the object.
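As an interpretive aid (not stated in the source), the square-root term can be read through an equivalent rectangle: a w × l rectangle has P = 2(w + l) and A = wl, so w and l are the roots of t² - (P/2)t + A = 0, giving w = (P - √(P² - 16A))/4 and l = (P + √(P² - 16A))/4. The factor P - √(P² - 16A) in the elongation formula is therefore four times the line width of a rectangle with the same area and perimeter. A small Python check of this reading; the function name is hypothetical.

```python
import math
from typing import Tuple


def equivalent_rectangle(area: float, perimeter: float) -> Tuple[float, float]:
    """Width and length of the rectangle having the given area and perimeter."""
    # For compact shapes P^2 can fall below 16A (the isoperimetric bound is only
    # P^2 >= 4*pi*A), so the discriminant is clamped at zero.
    disc = max(0.0, perimeter ** 2 - 16 * area)
    root = math.sqrt(disc)
    return (perimeter - root) / 4, (perimeter + root) / 4


print(equivalent_rectangle(200, 204))  # a 2 x 100 stroke -> (2.0, 100.0)
```

For thin strokes the recovered width stays close to the true line width even though the object is long, which is what makes the term useful for separating line-like graphics from compact characters.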
The hull is the convex polygon which bounds the object. There are fast algorithms which compute the convex hull for a given set of points.

Hull eccentricity is the ratio between the width and the height of the hull.
Black/white ratio is the ratio between the area of the object and its hull area.

The Euler number indicates the number of holes in the object. It is defined as one minus the number of holes.

The number of sharp corners is computed as follows: first, a polygonal approximation of the contours is generated. This approximation is generated several times, each time with a bigger error threshold. This is done as long as the number of polygon segments continues to drop linearly with respect to the increase of the error threshold. The last approximation is used for the evaluation of the number of sharp corners. A sharp corner is a corner in the approximating polygon which has an angle of less than 95 degrees.
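Counting sharp corners on an already approximated polygon can be sketched as below. The approximation itself (Fig. 10) and the rule for choosing the final error threshold are not reproduced; the sketch assumes the polygon is a list of vertices in order and takes the angle of a corner to be the unsigned angle between its two edges. The 95-degree limit is the value given in the text; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def corner_angle(prev: Point, v: Point, nxt: Point) -> float:
    """Unsigned angle in degrees between the two polygon edges meeting at vertex v."""
    ax, ay = prev[0] - v[0], prev[1] - v[1]
    bx, by = nxt[0] - v[0], nxt[1] - v[1]
    return math.degrees(math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by))


def count_sharp_corners(polygon: List[Point], max_angle: float = 95.0) -> int:
    """Number of vertices of a closed polygon whose corner angle is below max_angle."""
    n = len(polygon)
    return sum(
        corner_angle(polygon[i - 1], polygon[i], polygon[(i + 1) % n]) < max_angle
        for i in range(n)
    )


print(count_sharp_corners([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 4
```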

Fig. 10 is a flow chart illustrating one algorithm that may be used for performing a polygonal approximation operation in the object measurement step (e).

Object Classification (Block 26) 

This step involves classifying the objects as graphics or text. In this step, the objects are traversed in a bottom-up fashion and are classified according to the measurements taken in the previous step, and according to the classes that were given to the successive objects in the tree. The classification is done according to a set of predefined rules and thresholds. Appendix C is an example of such rules and thresholds, as illustrated in the flow diagrams of Figs. 11a and 11b.

Output Generation (Block 27) 

This step involves generating outputs representing text and graphics, as illustrated by documents 4 and 6, respectively, in Fig. 1.

In this step, the original image is read again and written back in different colours. White pixels remain white, but black pixels change according to the class of the object in which they reside (each class is assigned a different colour). Two adjacent black pixels are never painted in different colours, because the dilation operation prevents them from being associated with different objects; therefore, it prevents them from having different classes, and thus different colours.

After the black pixels are repainted, the whole process can be repeated for the white pixels. That is, if it is necessary to discriminate between the various white objects, the steps of blocks 21-27 of the flow chart of Fig. 2 should be carried out again, but this time step (b) (block 22), namely the dilation step, should be performed on the white pixels and not on the black pixels.

The problem of output generation is actually reduced to the problem of finding for each black pixel the object 
in which it resides. This object directly defines the class, and the class defines the new colour for that pixel. 

One algorithm which may be used for output generation, as illustrated in Fig. 12, is given in the attached Appendix D.

The invention has been described with respect to one preferred embodiment, but it will be appreciated that many variations and other applications of the invention may be made.

While the flow diagram of Fig. 2 illustrates the steps as performed sequentially, such steps can be, and preferably are, performed in pipeline fashion. Thus, during scanning via the input window, as soon as the end of an object is determined, processing of the output of the object can be started from the highest line of the object.



EP 0 516 576 A2 



APPENDIX A

The following is an algorithm for dilation of a run-length coded image.

d - distance threshold.
line_i - input line number i.
line'_i - output line number i.
strip_j - batch of 2d+1 lines (line_{j-d}, ..., line_{j+d}).

1. initialize brush-vector: b[i] ←
2. initialize lines-counter: j ← 0
3. clear line_i, -d ≤ i < 0
4. read first d lines into line_0, line_1, ..., line_{d-1}
5. while not end-of-file do
6.     clear line'_j
7.     read next input line into line_{j+d}
8.     partition strip_j into patterns: P_1, P_2, ..., P_n
9.     for each P_k, 1 ≤ k ≤ n do
10.        if P_k is not totally WHITE (not zero) then
11.            set LEFT, RIGHT to mark the left and right margins of P_k
12.            find minimal |i| for which P_k[i] is BLACK
13.            insert black run [LEFT - b[i], RIGHT + b[i]] into line'_j
14.        end
15.    end
16.    output line'_j
17.    j ← j + 1
18. end

A pattern is a slice in a strip that contains 2d+1 line segments which start and end at the same coordinate (see Fig. 5c). A pattern is maximal in the sense that it is the widest slice that contains no color transitions along the line segments which constitute the pattern. P_k[i] is the color of the i'th line segment in P_k (see Fig. 5d).
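A Python sketch of the Appendix A procedure on run-length data. Since the source does not give the contents of the brush vector b[i], the sketch assumes a disc-shaped brush of radius d, b[i] = ⌊√(d² - i²)⌋. Runs are black half-open intervals [start, end), lines outside the image are treated as white (steps 3 and 4), and all names are hypothetical. Pattern boundaries are taken as every coordinate where some line of the strip changes colour, so each slice between consecutive boundaries has no transitions, as the definition of a pattern requires.

```python
import math
from typing import Dict, List, Tuple

Run = Tuple[int, int]  # black run as a half-open interval [start, end)


def brush_vector(d: int) -> Dict[int, int]:
    """Assumed brush: half-width of a disc of radius d at row offset i, for -d <= i <= d."""
    return {i: math.isqrt(d * d - i * i) for i in range(-d, d + 1)}


def merge_runs(runs: List[Run]) -> List[Run]:
    """Sort runs and merge those that overlap or touch."""
    merged: List[Run] = []
    for start, end in sorted(runs):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def dilate_runs(lines: List[List[Run]], width: int, d: int) -> List[List[Run]]:
    b = brush_vector(d)
    height = len(lines)
    output = []
    for j in range(height):
        # strip_j: lines j-d .. j+d; lines outside the image count as white.
        strip = {i: lines[j + i] if 0 <= j + i < height else [] for i in range(-d, d + 1)}
        # Pattern boundaries: every coordinate where some line of the strip changes colour.
        xs = sorted({x for runs in strip.values() for run in runs for x in run})
        new_runs: List[Run] = []
        for left, right in zip(xs, xs[1:]):
            # Offsets i whose line segment is black throughout [left, right).
            black = [i for i, runs in strip.items()
                     if any(s <= left and right <= e for s, e in runs)]
            if not black:
                continue  # the pattern is totally white
            i = min(black, key=abs)  # nearest black segment gives the widest brush
            new_runs.append((max(0, left - b[i]), min(width, right + b[i])))
        output.append(merge_runs(new_runs))  # line'_j
    return output


print(dilate_runs([[], [(5, 6)], []], width=20, d=1))  # [[(5, 6)], [(4, 7)], [(5, 6)]]
```

Because b[i] never grows as |i| grows, taking the black segment with minimal |i| (step 12) already covers the extension contributed by every other black segment in the same pattern.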
APPENDIX B

Input: image given in run-length format.
Output: list of objects and chains.

An object contains:
a. color-code describing the color of the object
b. area of object (number of pixels)
c. pointers to the chains which partition the contour of the object.

A chain contains:
a. chain-code describing a segment of contour
b. length of chain (number of links)
c. pointers to the objects on both sides of the chain.

The algorithm uses the following variables:
x, y - pointers to current scan location.
line0, line1 - hold the contents of two successive input lines.
gchains - list of "growing" chains.
chainp - pointer to chain in gchains.

1. frame the image (frame width = 1, frame color = WHITE)
2. init objects, chains, gchains
3. y ← 0
4. read first line into line0
5. while y < height-of-image do:
6.     x ← 0
7.     read next line into line1
8.     set chainp to point at first chain in gchains list
9.     while x < width-of-image do:
10.        advance x to coordinate of next run
11.        identify state of colors in the 2x2 window centered at (x,y)
12.        handle this state (see Fig. 8)
13.    end
14.    y ← y + 1
15.    line0 ← line1
16. end

At step 1, the framing process can be done concurrently at the time the image is read (steps 4 and 7). At step 10, variable x is advanced to the coordinate of the minimal run offset; thus, no run is skipped. Each run is processed twice: once as a member of line0 and once as a member of line1.
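A sketch of steps 10 and 11: visiting only the x coordinates where either of the two lines changes colour, and encoding the 2x2 window there as a 4-bit state. The bit layout, the placement of the window over columns x-1 and x, the half-open run representation and the helper names are assumptions; the handling of each state (the decision table of Fig. 8) is not reproduced.

```python
from typing import Iterator, List, Tuple

Run = Tuple[int, int]  # black run, half-open interval [start, end)


def transitions(runs: List[Run]) -> List[int]:
    """Coordinates at which a line changes colour."""
    return sorted({x for run in runs for x in run})


def colour_at(runs: List[Run], x: int) -> int:
    """1 if pixel x is black, else 0."""
    return int(any(s <= x < e for s, e in runs))


def window_states(line0: List[Run], line1: List[Run]) -> Iterator[Tuple[int, int]]:
    """Yield (x, state) at each x where line0 or line1 changes colour.

    The window covers columns x-1 and x of both lines; the state bits are
    (top-left, top-right, bottom-left, bottom-right), with 1 = black.
    """
    for x in sorted(set(transitions(line0)) | set(transitions(line1))):
        tl, tr = colour_at(line0, x - 1), colour_at(line0, x)
        bl, br = colour_at(line1, x - 1), colour_at(line1, x)
        yield x, (tl << 3) | (tr << 2) | (bl << 1) | br


# A black run appearing below an all-white line: state 0b0001 marks where it begins.
print(list(window_states([], [(2, 4)])))  # [(2, 1), (4, 2)]
```

Only two coordinates are visited here instead of every pixel, which is the saving the text attributes to scanning the run-length form.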
APPENDIX C

The following is an algorithm for object classification.

1. class ← TEXT
2. if area < C1 · resolution² then class ← NOISE
3. if object has no predecessor then class ← BACKGROUND
4. if Euler number < C2 then class ← BACKGROUND
5. if elongation > C3 then class ← GRAPHICS
6. if sharp corners > C4 then class ← GRAPHICS
7. if B/W ratio < C5 then class ← GRAPHICS
8. if elongation > C6 then class ← VOID
9. if eccentricity > C7 then class ← VOID
10. if class is BACKGROUND then
11.     for each successor object sobj do:
12.         if class of sobj is TEXT then
13.             change the class of all successors of sobj to BACKGROUND
14. end

Constant   Value
C1         0.25
C2         -8
C3         90
C4         20
C5         0.05
C6         25
C7         15
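The rules and constants above can be transcribed into Python as a sketch. The `Obj` record, the helper names and the string class labels are hypothetical. Rules 1-9 are applied in the listed order, so a later rule overrides an earlier one, exactly as the listing reads; "all successors of sobj" in rule 13 is taken to mean its direct successors, and resolution is assumed to be in pixels/mm so that C1 · resolution² is an area in pixels.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Constants from the table above.
C1, C2, C3, C4, C5, C6, C7 = 0.25, -8, 90, 20, 0.05, 25, 15


@dataclass
class Obj:
    area: float  # pixels
    euler: int
    elongation: float
    sharp_corners: int
    bw_ratio: float
    eccentricity: float
    predecessor: Optional["Obj"] = None
    successors: List["Obj"] = field(default_factory=list)
    cls: str = ""


def add_successor(parent: "Obj", child: "Obj") -> None:
    """Link two objects in the inclusion tree."""
    parent.successors.append(child)
    child.predecessor = parent


def classify(obj: Obj, resolution: float) -> str:
    """Rules 1-9, applied in the listed order."""
    cls = "TEXT"
    if obj.area < C1 * resolution ** 2:
        cls = "NOISE"
    if obj.predecessor is None:
        cls = "BACKGROUND"
    if obj.euler < C2:
        cls = "BACKGROUND"
    if obj.elongation > C3:
        cls = "GRAPHICS"
    if obj.sharp_corners > C4:
        cls = "GRAPHICS"
    if obj.bw_ratio < C5:
        cls = "GRAPHICS"
    if obj.elongation > C6:
        cls = "VOID"
    if obj.eccentricity > C7:
        cls = "VOID"
    return cls


def classify_tree(obj: Obj, resolution: float) -> None:
    """Bottom-up traversal: successors are classified before the object itself."""
    for child in obj.successors:
        classify_tree(child, resolution)
    obj.cls = classify(obj, resolution)
    if obj.cls == "BACKGROUND":  # rules 10-13
        for sobj in obj.successors:
            if sobj.cls == "TEXT":
                for inner in sobj.successors:  # e.g. the hole inside a character
                    inner.cls = "BACKGROUND"


page = Obj(area=1e6, euler=-3, elongation=1.0, sharp_corners=0, bw_ratio=0.5, eccentricity=1.0)
letter = Obj(area=400, euler=0, elongation=2.0, sharp_corners=0, bw_ratio=0.5, eccentricity=1.0)
hole = Obj(area=100, euler=1, elongation=1.5, sharp_corners=0, bw_ratio=0.8, eccentricity=1.0)
add_successor(page, letter)
add_successor(letter, hole)
classify_tree(page, resolution=10)
print(page.cls, letter.cls, hole.cls)  # BACKGROUND TEXT BACKGROUND
```

The example shows the purpose of rules 10-13: the white hole inside a letter first passes as TEXT on its own measurements and is then returned to BACKGROUND once its enclosing character is recognised as text.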
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APPENDIX D 

The following is an algorithm for output generation, 

/ - length of run. 

c - color of mn, 

x,y - coordinates in the image. 

1. y<-0 

2, while not cnd-of-file do 



3. y y + 1 

4. read next line from source file 

5. ;c 0 

6. while more runs in line do 

7. X ^ staring position of next run 

8. / length of this run 

9. c color of this run 

10. ifCTfeWHTTEdO 

11- find the object ^6; which contains point {x,y) 

12. class <— obj, class 

13. c 4- color[class] 

14. endif 

15. output run of / pixels in color c 

16. end 



17. end 



In step 11, the object containing point (x,y) is searched for. This search can be greatly simplified if the chains discovered during contour detection are recorded for use in this phase. These chains always start at maximal points and progress downwards in pairs. Therefore, while progressing downwards through the file, the algorithm can track which pairs of chains are active on a given line and use this knowledge to find the object that spans between each pair of chains.

This knowledge can even be used to run the process as a pipeline: output generation is then triggered by signals from the classification module indicating that a new object has been completely discovered and classified.
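The chain-pair search can be sketched as follows. This is a simplified illustration under stated assumptions. Each recorded pair (the hypothetical `ChainPair`) is assumed to store, for every line from its starting maximal point downwards, the x-extent between its left and right chains. Pairs are activated when the scan reaches their starting line and dropped when their chains end. When several active spans contain x, the narrowest one is taken to be the innermost object. The class names and the innermost-span rule are assumptions, not details given in the appendix.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChainPair:
    """Hypothetical record of one left/right chain pair from contour detection."""
    obj_id: int
    top: int                      # line of the maximal point where both chains start
    spans: List[Tuple[int, int]]  # (x_left, x_right) for lines top, top+1, ...

    def span_at(self, y: int) -> Optional[Tuple[int, int]]:
        i = y - self.top
        return self.spans[i] if 0 <= i < len(self.spans) else None


class ActivePairs:
    """Tracks which chain pairs are active while scanning down the file."""

    def __init__(self, pairs: List[ChainPair]):
        self._pending = sorted(pairs, key=lambda p: p.top)  # activation order
        self._next = 0
        self._active: List[ChainPair] = []
        self._y = -1

    def advance_to(self, y: int) -> None:
        assert y >= self._y, "lines must be visited top-down"
        self._y = y
        # Activate every pair whose chains start on or above this line.
        while self._next < len(self._pending) and self._pending[self._next].top <= y:
            self._active.append(self._pending[self._next])
            self._next += 1
        # Drop pairs whose chains ended above this line.
        self._active = [p for p in self._active if p.top + len(p.spans) > y]

    def find_object(self, x: int) -> Optional[int]:
        """Return the id of the innermost (narrowest) active span containing x."""
        best: Optional[int] = None
        best_width: Optional[int] = None
        for p in self._active:
            span = p.span_at(self._y)
            if span is not None and span[0] <= x <= span[1]:
                width = span[1] - span[0]
                if best_width is None or width < best_width:
                    best, best_width = p.obj_id, width
        return best
```

Because only the currently active pairs are examined, the search cost per run depends on how many objects cross the current line, not on the total number of objects in the image.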






EP 0 516 576 A2 



Claims 




1. A method of analyzing a source image to separate text from graphics, comprising:

(a) scanning and digitizing the source image to obtain a binary image including black and white objects;

(b) filtering out the noise from the binary image to obtain a filtered binary image;

(c) extracting the contours of the black objects and the white objects from the filtered binary image;

(d) evaluating inclusion relationships between the objects, and generating a tree-like structure of such relationships;

(e) utilizing said contours for measuring the objects to obtain the shape properties of each object;

(f) effecting classification of the objects as graphics or text according to the measured shape properties and then generating a tree-like structure of the inclusion relationships;

(g) and utilizing said source image and said classification of the objects for generating outputs representing graphics and text, respectively.

2. The method according to Claim 1, wherein in step (b), the noise is filtered out by dilation of the black pixels.

3. The method according to either of Claims 1 or 2, wherein in step (e), the objects are measured in a top-down sequence, starting with the object at the root of the tree.

4. The method according to any one of Claims 1-3, wherein in step (c), extracting the contours of the black objects and the white objects from the filtered binary image is effected by a single scan in which a window is convolved with the filtered binary image in a raster fashion.

5. The method according to Claim 4, wherein the window scans the image along a line and returns an indication of the type of pattern seen from the window and an indication of the center of the window, each type of pattern being processed differently to determine whether a new object is started, continued or ended, all objects intersecting the current scan line being processed in parallel.

6. The method according to Claim 5, wherein a maximal point encountered during the window scan is considered to be a starting point of a new object, but if the scan later indicates that it was a maximal point of a previously indicated object, the new object is merged with the previously indicated object.

7. The method according to any one of Claims 1-6, wherein in step (d), the tree-like structure is generated by setting a pointer from each object to its predecessor, the predecessor of an object being found by determining which of the object's contours is the outermost one, and then setting the pointer to point at the object on one side of that contour.

8. The method according to any one of Claims 1-7, wherein in step (e), the objects are measured to obtain the following shape properties of each object: area of the object, number of contours, and perimeter length of each contour.

9. The method according to Claim 8, wherein in step (e), the following additional properties are determined from the measured shape properties: elongation, hull area, hull eccentricity, black/white ratio, Euler number, and number of sharp corners.

10. The method according to Claim 9, wherein the number of sharp corners is determined by:

generating several polygonal approximations of the contour, with each generation having a bigger error threshold, as long as the number of polygon segments drops linearly with respect to the increase in the error threshold;

and determining that a sharp corner exists when the last polygon approximation has an angle of less than 60°.

11. The method according to any one of Claims 1-10, wherein in step (g), the generated outputs representing graphics and texts are in the form of different images.

12. The method according to any one of Claims 1-10, wherein in step (g), the generated outputs representing graphics and texts are in the form of different colours of the same image.

13. The method according to any one of Claims 1-10, wherein the source image contains text of different sizes, orientations and/or fonts.



14. The method according to Claim 2, wherein steps (a)-(g) are repeated, except that the noise is filtered out in step (b) by dilation of the white pixels, so that white objects of the source image are separated, thereby providing discrimination of white text and graphics over a black background.

15. The method according to any one of Claims 1-10, wherein the source image contains black text, white text, black graphics, white graphics, black and white background, and black and white noise.